263 research outputs found

    Neural networks and support vector machines based bio-activity classification

    Classifying compounds into their biological activity classes is important in drug discovery, particularly for virtual compound filtering and screening at an early phase. In this work, two types of neural networks, the multilayer perceptron (MLP) and the radial basis function (RBF) network, and support vector machines (SVM) were employed to classify three types of biologically active enzyme inhibitors. Both networks were trained with the backpropagation learning method on chemical compounds whose inhibition properties were previously known. A group of topological indices, selected with the help of principal component analysis (PCA), was used as descriptors. The results of the three classification methods show that both neural networks perform better than the SVM.

    Features based text similarity detection

    As the Internet helps us cross cultural borders by providing diverse information, plagiarism issues are bound to arise. As a result, plagiarism detection is increasingly in demand. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those tools. However, when handling large articles, fingerprint matching has weaknesses, especially in space and time consumption. In this paper, we propose a new approach to detecting plagiarism that integrates fingerprint matching with four key features to assist the detection process. The proposed features are able to select the main points or key sentences in the articles to be compared. The selected sentences then undergo the fingerprint matching process to detect the similarity between the sentences. Hence, the time and space used for the comparison process are reduced without affecting the effectiveness of the plagiarism detection.
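
The fingerprint matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the four key features or the hashing scheme, so the sketch assumes the key sentences have already been selected and uses hashed word trigrams as fingerprints. The function names `fingerprints` and `fingerprint_overlap`, the choice of MD5, and the example sentences are all hypothetical.

```python
import hashlib


def fingerprints(sentence, k=3):
    """Hash every run of k consecutive words into a compact integer fingerprint."""
    words = sentence.lower().split()
    grams = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    # Keep 32 bits of an MD5 digest; any stable hash would do for this sketch.
    return {int(hashlib.md5(g.encode("utf-8")).hexdigest()[:8], 16) for g in grams}


def fingerprint_overlap(a, b, k=3):
    """Fraction of the smaller fingerprint set that also occurs in the other set."""
    fa, fb = fingerprints(a, k), fingerprints(b, k)
    if not fa or not fb:
        return 0.0  # a sentence shorter than k words has no fingerprints
    return len(fa & fb) / min(len(fa), len(fb))


# Hypothetical key sentences, standing in for the output of the selection step.
source = "the quick brown fox jumps over the lazy dog"
suspect = "a quick brown fox jumps over a sleeping dog"
score = fingerprint_overlap(source, suspect)
unrelated = fingerprint_overlap(source, "data mining helps power plant operators")
```

Because only the selected sentences are fingerprinted, the number of hashes stored and compared grows with the number of key sentences rather than with the full article length, which is the saving the abstract claims.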

    A Review of using Data Mining Techniques in Power Plants

    Data mining techniques and their applications have developed rapidly during the last two decades. This paper reviews the application of data mining techniques in power systems, especially in power plants, through a survey of the literature published between 2000 and 2015. Keyword indices, articles’ abstracts and conclusions were used to classify more than 86 articles on the application of data mining in power plants, drawn from many academic journals and research centers. Because this paper concerns the application of data mining in power plants, it starts with a brief introduction to data mining and power systems to give the reader a better view of these two different disciplines. The paper then presents a comprehensive survey of the collected articles and classifies them according to three categories: the techniques used, the problem, and the application area. From this review we found that data mining techniques (classification, regression, clustering and association rules) can be used to solve many types of problems in power plants, such as predicting the amount of generated power, failure prediction, failure diagnosis, failure detection and many others. We also found that there is no standard technique for a specific problem. The application of data mining in power plants is a rich research area that still needs more exploration.

    Deep Learning Approaches for Big Data Analysis

    Good representations of data eliminate irrelevant variability in the input data while preserving the information that is useful for the ultimate task. One way to learn representations is to use deep learning methods. Deep feature hierarchies are formed by stacking unsupervised modules on top of each other, forming multiple non-linear transformations that produce better representations. In this talk, we will first show how deep learning is used for bioactivity prediction of chemical compounds. Molecules are represented using several convolutional neural networks (CNNs) to predict their bioactivity. In addition, a new concept of merging multiple convolutional neural networks and automatically learning a feature representation for the chemical compounds was proposed, using the values within the neurons of the last layer of the CNN architecture. We will also show how the concepts of deep learning are adapted into a deep belief network (DBN) to enhance molecular similarity searching. The DBN achieves feature abstraction by using a reconstruction weight for each feature and minimizing the reconstruction error over the whole feature set. The DBN is later enhanced using data fusion, which draws on data from multiple distributed descriptors to obtain a lower detection error probability and higher reliability. Secondly, we will show how we used deep learning for stock market prediction. Here, we developed a deep Long Short-Term Memory (LSTM) network model that is able to forecast crude palm oil price movement from combined factors such as other commodity prices, weather, news sentiment and the price movement of crude palm oil itself. We will also show how we combined stock market prices and financial news and deployed LSTM, a recurrent neural network (RNN), and Word2Vec to project stock prices for the following seven days.
    Finally, we will show how we exploited deep learning for opinion mining and later used it to extract product aspects from users' textual reviews for recommendation systems. Specifically, we employ a multichannel convolutional neural network (MCNN) with two different input layers, namely a word embedding layer and a part-of-speech (POS) tag embedding layer. We will show the effectiveness of the proposed model in terms of both aspect extraction and rating prediction performance.

    Textual and structural approaches to detecting figure plagiarism in scientific publications

    Figures play an important role in disseminating ideas and findings, enabling readers to understand the details of a work. Their role in understanding documents has increased their use, which has led to the serious problem of taking other people’s figures without crediting the source. Although significant effort has gone into developing methods for estimating pairwise diagram figure similarity, the research community has paid little attention to detecting instances of figure plagiarism, such as manipulating a figure by changing its structure; inserting, deleting or substituting its components; or manipulating its text content. To address this gap, this project compares the effectiveness of textual and structural representation techniques in supporting figure plagiarism detection. For these two representations, the textual comparison method is designed to match figure contents based on a word-gram representation using the Jaccard similarity measure, while the structural comparison method is designed to compare the text within the components, as well as the relationships between the components of the figures, using the graph edit distance measure. These techniques are experimentally evaluated across seven instances of figure plagiarism in terms of their similarity values and the precision and recall metrics. The experimental results show that the structural representation of figures slightly outperformed the textual representation in detecting all the instances of figure plagiarism.
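
The textual comparison described above, word-grams scored with the Jaccard measure, can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code: the gram size of two, the helper names `word_grams` and `jaccard`, and the two figure texts are assumptions made for the example.

```python
def word_grams(text, n=2):
    """Return the set of n consecutive-word tuples found in a figure's text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(a | b)


# Hypothetical text extracted from two flowchart figures; the second one
# inserts an extra component ("evaluate model") into the first.
fig_a = "start read data train model stop"
fig_b = "start read data train model evaluate model stop"
similarity = jaccard(word_grams(fig_a), word_grams(fig_b))
```

Here every bigram of the first figure also appears in the second, so the score is limited only by the two bigrams introduced by the inserted component. The structural method in the abstract, graph edit distance over components and their relationships, is not covered by this sketch.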

    Use Data Mining Techniques to Identify Parameters That Influence Generated Power in Thermal Power Plant

    The goal of this paper is to identify the parameters that influence the amount of power generated by steam power plants. Data mining tools were used to show that the influencing parameters differ according to the current status of the power plant. The Waikato Environment for Knowledge Analysis (Weka) was used for feature selection and for building the prediction model. An initial comparison of many algorithms on each data set is reported. The prediction model was then built using the linear regression algorithm, because it shows the highest correlation coefficient between parameters and the minimum errors. The selected model predicts the generated power using all available parameters as predictors. Although this is not a practical method for power prediction, because not all predictors are controllable, it reflects how much each parameter influences the amount of generated power. The evaluation results of these models are discussed, and a detailed analysis sheet was prepared to show that data mining is the best way to predict the amount of generated power and to indicate the health status of a steam power plant.
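
The two criteria mentioned above, correlation coefficient and prediction error, can be made concrete with a small sketch. The paper used Weka and all available parameters as predictors; the sketch below instead fits a single predictor by ordinary least squares in plain Python, purely for illustration. The function name `fit_line`, the variable names, and the numbers are made-up toy values, not data from the paper.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Toy values (hypothetical): one plant parameter and the generated power.
steam_flow = [10.0, 20.0, 30.0, 40.0, 50.0]
power_mw = [21.0, 39.0, 61.0, 79.0, 101.0]

slope, intercept, r = fit_line(steam_flow, power_mw)

# Mean absolute error of the fitted line on the same toy data.
predictions = [slope * x + intercept for x in steam_flow]
mae = sum(abs(p - y) for p, y in zip(predictions, power_mw)) / len(power_mw)
```

In this single-predictor setting the slope plays the role the abstract assigns to a regression coefficient: it indicates how much the generated power changes per unit change in the parameter.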

    Prediction of Banks Financial Distress

    In this research we conduct a comprehensive review of the existing literature on prediction techniques that have been used to help predict bank distress. We categorized the review results into groups according to prediction technique. The categorization first used the time period of the literature found: we mark the literature from the period 1990-2010 as the history of prediction techniques, and the literature after this period until 2013 as recent prediction techniques, and then present the strengths and weaknesses of both. We conclude that no specific type of technique fits all bank distress issues, although we found that intelligent hybrid techniques are considered the strongest candidate methods in terms of accuracy and reputation.

    Systematic literature review (SLR) automation: a systematic literature review

    Context: A systematic literature review (SLR) is a methodology used to find and aggregate all relevant studies about a specific research question or topic of interest. Most SLR processes are conducted manually. Automating these processes can reduce the workload and time required of humans. Method: We use SLR as a methodology to survey the literature on the technologies used to automate SLR processes. Result: From the collected data we found much work on automating the study selection process, but no evidence of automation of the planning and reporting processes. Most authors use machine learning classifiers to automate the study selection process. Our survey also found processes similar to the SLR processes for which automatic techniques exist. Conclusion: Based on these results, we conclude that more research should be done on the planning, reporting, data extraction and synthesis processes of SLR.